Cascaded and low-consuming online method for large-scale Web page category acquisition
WANG Yaqiang, TANG Ming, ZENG Qin, TANG Dan, SHU Hongping
Journal of Computer Applications    2017, 37 (4): 924-927.   DOI: 10.11772/j.issn.1001-9081.2017.04.0924
To balance accuracy against resource cost when building an automatic system for collecting massive numbers of well-classified Web pages, a cascaded, low-consuming online method for large-scale Web page category acquisition was proposed. It uses a cascaded strategy to integrate online and offline Web page classifiers so as to take full advantage of both. An online Web page classifier trained on anchor-text features serves as the first-level classifier, and the confidence of its classification result is computed from the information entropy of the posterior probability. The second-level classifier is triggered when this confidence value is larger than a predefined threshold obtained by Multi-Objective Particle Swarm Optimization (MOPSO). At the second level, features are extracted from the downloaded Web page, which is then classified by an offline classifier pre-trained on Web pages. In comparison experiments with single online classification and single offline classification, the proposed method increased the F1 measure by 10.85% and 4.57% respectively. Moreover, its efficiency decreased by less than 30% compared with single online classification, and improved by about 70% compared with single offline classification. The results demonstrate that the proposed method not only classifies more accurately but also significantly reduces computing overhead and bandwidth consumption.
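The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the callback `fetch_and_classify_offline`, and the fixed `entropy_threshold` are all hypothetical. In the paper the threshold is obtained by MOPSO, which is not reproduced here. The sketch also assumes that the entropy of the posterior is itself the value compared with the threshold, so that a high entropy (an uncertain anchor-text prediction) triggers the second level.

```python
import math
from typing import Callable, Sequence, Tuple


def posterior_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a posterior distribution over categories."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def cascaded_classify(
    anchor_posterior: Sequence[float],
    fetch_and_classify_offline: Callable[[], int],
    entropy_threshold: float,
) -> Tuple[int, str]:
    """Return (category index, which level produced it).

    anchor_posterior: posterior from the first-level (anchor-text) classifier.
    fetch_and_classify_offline: hypothetical callback that downloads the page
        and runs the second-level (offline) classifier; only called on demand.
    entropy_threshold: assumed to be given; the paper tunes it with MOPSO.
    """
    if posterior_entropy(anchor_posterior) > entropy_threshold:
        # First-level prediction is too uncertain: pay the download cost.
        return fetch_and_classify_offline(), "offline"
    # First-level prediction is accepted without downloading the page.
    best = max(range(len(anchor_posterior)), key=lambda i: anchor_posterior[i])
    return best, "online"


if __name__ == "__main__":
    # A peaked posterior stays at the first level; a flat one escalates.
    print(cascaded_classify([0.97, 0.02, 0.01], lambda: 2, 1.0))
    print(cascaded_classify([0.40, 0.35, 0.25], lambda: 2, 1.0))
```

Because the offline step is passed as a callback, the page is downloaded only when the first level is uncertain, which is where the bandwidth and computation savings reported in the abstract would come from.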